192 research outputs found

    Detecting Identity by Descent and Estimating Genotype Error Rates in Sequence Data

    Existing methods for identity by descent (IBD) segment detection were designed for SNP array data, not sequence data. Sequence data have a much higher density of genetic variants and a different allele frequency distribution, and can have higher genotype error rates. Consequently, best practices for IBD detection in SNP array data do not necessarily carry over to sequence data. We present a method, IBDseq, for detecting IBD segments in sequence data and a method, SEQERR, for estimating genotype error rates at low-frequency variants by using detected IBD. The IBDseq method estimates probabilities of genotypes observed with error for each pair of individuals under IBD and non-IBD models. The ratio of estimated probabilities under the two models gives a LOD score for IBD. We evaluate several IBD detection methods that are fast enough for application to sequence data (IBDseq, Beagle Refined IBD, PLINK, and GERMLINE) under multiple parameter settings, and we show that IBDseq achieves high power and accuracy for IBD detection in sequence data. The SEQERR method estimates genotype error rates by comparing observed and expected rates of pairs of homozygote and heterozygote genotypes at low-frequency variants in IBD segments. We demonstrate the accuracy of SEQERR in simulated data, and we apply the method to estimate genotype error rates in sequence data from the UK10K and 1000 Genomes projects
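
The LOD score described in this abstract can be sketched as a log10 likelihood ratio. This is a minimal illustration, not the IBDseq estimator itself: the per-marker probabilities are hypothetical inputs, the function name `lod_score` is invented here, and summing across markers assumes the markers are independent.

```python
import math


def lod_score(p_ibd, p_non_ibd):
    """Sum per-marker log10 likelihood ratios (IBD model vs non-IBD model).

    p_ibd and p_non_ibd are hypothetical per-marker probabilities of the
    observed genotype pair under each model. Treating markers as
    independent is an assumption of this sketch.
    """
    if len(p_ibd) != len(p_non_ibd):
        raise ValueError("both models need one probability per marker")
    return sum(math.log10(a / b) for a, b in zip(p_ibd, p_non_ibd))


# Made-up probabilities for three markers in one pair of individuals.
score = lod_score([0.20, 0.05, 0.30], [0.10, 0.01, 0.30])
print(f"LOD = {score:.3f}")
```

A positive score favors the IBD model for that pair of individuals, and a negative score favors the non-IBD model.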

    Nucleotide-binding oligomerization domain containing 1 (NOD1) haplotypes and single nucleotide polymorphisms modify susceptibility to inflammatory bowel diseases in a New Zealand caucasian population: a case-control study

    Background: The nucleotide-binding oligomerization domain containing 1 (NOD1) gene encodes a pattern recognition receptor that senses pathogens, leading to downstream responses characteristic of innate immunity. We investigated the role of NOD1 single nucleotide polymorphisms (SNPs) on inflammatory bowel disease (IBD) risk in a New Zealand Caucasian population, and studied Nod1 expression in response to bacterial invasion in the Caco2 cell line. Findings: DNA samples from 388 Crohn's disease (CD), 405 ulcerative colitis (UC), 27 indeterminate colitis patients and 201 randomly selected controls, from Canterbury, New Zealand were screened for 3 common SNPs in NOD1, using the MassARRAY® iPLEX Gold assay. Transcriptional activation of the protein produced by NOD1 (Nod1) was studied after infection of Caco2 cells with Escherichia coli LF82. Carrying the rs2075818 G allele decreased the risk of CD (OR = 0.66, 95% CI = 0.50–0.88, p < 0.002) but not UC. In CD, there was an increased frequency of the three-SNP (rs2075818, rs2075822, rs2907748) haplotype CTG (p = 0.004) and a decreased frequency of the GTG haplotype (p = 0.02). The rs2075822 CT or TT genotypes were at an increased frequency (genotype p value = 0.02), while the rs2907748 AA or AG genotypes showed decreased frequencies in UC (p = 0.04), but not in CD. Functional assays showed that Nod1 is produced 6 hours after bacterial invasion of the Caco2 cell line. Conclusion: The NOD1 gene is important in signalling invasion of colonic cells by pathogenic bacteria, indicative of its key role in innate immunity. Carrying specific SNPs in this gene significantly modifies the risk of CD and/or UC in a New Zealand Caucasian population.
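
An odds ratio with a 95% confidence interval, as reported in this abstract, can be computed from a 2x2 table of carrier counts. The sketch below uses the standard Woolf (log odds ratio) interval. The abstract does not say which interval method the authors used, and the counts in the example are invented for illustration, not taken from the study.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) using the Woolf method.

    z=1.96 gives an approximate 95% interval. All four cells must be
    positive, because the standard error uses their reciprocals.
    """
    counts = (exposed_cases, unexposed_cases,
              exposed_controls, unexposed_controls)
    if min(counts) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    log_or = math.log(odds_ratio)
    return (odds_ratio,
            math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


# Hypothetical counts: allele carriers vs non-carriers in cases and controls.
or_, lower, upper = odds_ratio_with_ci(10, 20, 30, 40)
print(f"OR = {or_:.2f}, 95% CI = {lower:.2f} to {upper:.2f}")
```

An interval that lies entirely below 1, like the one reported for the rs2075818 G allele in CD, indicates a reduced risk among carriers.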

    Interactions among genes in the ErbB-Neuregulin signalling network are associated with increased susceptibility to schizophrenia

    Background: Evidence of genetic association between the NRG1 (Neuregulin-1) gene and schizophrenia is now well-documented. Furthermore, several recent reports suggest association between schizophrenia and single-nucleotide polymorphisms (SNPs) in ERBB4, one of the receptors for Neuregulin-1. In this study, we have extended the previously published associations by investigating the involvement of all eight genes from the ERBB and NRG families for association with schizophrenia. Methods: Eight genes from the ERBB and NRG families were tested for association with schizophrenia using a collection of 396 cases and 1,342 blood bank controls ascertained from Aberdeen, UK. A total of 365 SNPs were tested. Association testing of both alleles and genotypes was carried out using the fast Fisher's Exact Test (FET). To understand better the nature of the associations, all pairs of SNPs separated by ≥ 0.5 cM with at least nominal evidence of association (P < 0.10) were tested for evidence of pairwise interaction by logistic regression analysis. Results: 42 out of 365 tested SNPs in the eight genes from the ERBB and NRG gene families were significantly associated with schizophrenia (P < 0.05). Associated SNPs were located in ERBB4 and NRG1, confirming earlier reports. However, novel associations were also seen in NRG2, NRG3 and EGFR. In pairwise interaction tests, clear evidence of gene-gene interaction was detected for NRG1-NRG2, NRG1-NRG3 and EGFR-NRG2, and suggestive evidence was also seen for ERBB4-NRG1, ERBB4-NRG2, ERBB4-NRG3 and ERBB4-ERBB2. Evidence of intragenic interaction was seen for SNPs in ERBB4. Conclusion: These new findings suggest that observed associations between NRG1 and schizophrenia may be mediated through functional interaction not just with ERBB4, but with other members of the NRG and ERBB families. There is evidence that genetic interaction among these loci may increase susceptibility to schizophrenia.
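
The allelic association test named in this abstract can be illustrated with a plain two-sided Fisher's exact test on a 2x2 table. This is a textbook sketch, not the "fast" variant the authors mention, and the allele counts in the example are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # A small relative tolerance guards against floating-point ties.
    p_value = sum(prob(x) for x in range(lowest, highest + 1)
                  if prob(x) <= p_observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical allele counts: rows are cases and controls,
# columns are the minor and major allele.
print(fisher_exact_two_sided(30, 70, 15, 85))
```

With 365 SNPs tested at P < 0.05, about 18 nominal hits would be expected by chance alone, which is useful context for the 42 reported associations.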

    Performance of Genotype Imputation for Rare Variants Identified in Exons and Flanking Regions of Genes

    Genotype imputation has the potential to assess human genetic variation at a lower cost than assaying the variants using laboratory techniques. The performance of imputation for rare variants has not been comprehensively studied. We utilized 8865 human samples with high depth resequencing data for the exons and flanking regions of 202 genes and Genome-Wide Association Study (GWAS) data to characterize the performance of genotype imputation for rare variants. We evaluated reference sets ranging from 100 to 3713 subjects for imputing into samples typed for the Affymetrix (500K and 6.0) and Illumina 550K GWAS panels. The proportion of variants that could be well imputed (true r² > 0.7) with a reference panel of 3713 individuals was: 31% (Illumina 550K) or 25% (Affymetrix 500K) with minor allele frequency (MAF) ≤ 0.001, 48% or 35% with 0.001 < MAF ≤ 0.005, 54% or 38% with 0.005 < MAF ≤ 0.01, 78% or 57% with 0.01 < MAF ≤ 0.05, and 97% or 86% with MAF > 0.05. The performance for common SNPs (MAF > 0.05) within exons and flanking regions is comparable to imputation of more uniformly distributed SNPs. The performance for rare SNPs (0.01 < MAF ≤ 0.05) was much more dependent on the GWAS panel and the number of reference samples. These results suggest routine use of genotype imputation for extending the assessment of common variants identified in humans via targeted exon resequencing into additional samples with GWAS data, but imputation of very rare variants (MAF ≤ 0.005) will require reference panels with thousands of subjects.
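
The "true r²" criterion in this abstract is usually taken to mean the squared Pearson correlation between imputed allele dosages and genotypes observed directly by sequencing. The sketch below assumes that reading. The function name and the example dosages are hypothetical.

```python
def squared_correlation(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true genotypes (0, 1, 2 copies
    of the minor allele) and imputed dosages (real values in [0, 2]).

    Returns 0.0 when either input has no variance, since the correlation
    is undefined in that case (a convention chosen for this sketch).
    """
    if len(true_genotypes) != len(imputed_dosages) or not true_genotypes:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_genotypes)
    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)
    if var_t == 0 or var_d == 0:
        return 0.0
    return cov * cov / (var_t * var_d)


# Made-up data for one variant in six samples.
truth = [0, 0, 1, 0, 2, 1]
dosage = [0.1, 0.0, 0.8, 0.2, 1.7, 1.1]
r2 = squared_correlation(truth, dosage)
print("well imputed" if r2 > 0.7 else "poorly imputed")
```

For very rare variants most samples carry zero copies, so a handful of carriers determine the correlation. That is consistent with the abstract's finding that these variants need reference panels with thousands of subjects.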

    A FUSE Survey of Interstellar Molecular Hydrogen in the Small and Large Magellanic Clouds

    We describe a moderate-resolution FUSE survey of H2 along 70 sight lines to the Small and Large Magellanic Clouds, using hot stars as background sources. FUSE spectra of 67% of observed Magellanic Cloud sources (52% of LMC and 92% of SMC) exhibit absorption lines from the H2 Lyman and Werner bands between 912 and 1120 Å. Our survey is sensitive to N(H2) >= 10^14 cm^-2; the highest column densities are log N(H2) = 19.9 in the LMC and 20.6 in the SMC. We find reduced H2 abundances in the Magellanic Clouds relative to the Milky Way, with average molecular fractions ⟨f_H2⟩ = 0.010 (+0.005, -0.002) for the SMC and ⟨f_H2⟩ = 0.012 (+0.006, -0.003) for the LMC, compared with ⟨f_H2⟩ = 0.095 for the Galactic disk over a similar range of reddening. The dominant uncertainty in this measurement results from the systematic differences between 21 cm radio emission and Lyα in pencil-beam sight lines as measures of N(HI). These results imply that the diffuse H2 masses of the LMC and SMC are 8 x 10^6 Msun and 2 x 10^6 Msun, respectively, 2% and 0.5% of the H I masses derived from 21 cm emission measurements. The LMC and SMC abundance patterns can be reproduced in ensembles of model clouds with a reduced H2 formation rate coefficient, R ~ 3 x 10^-18 cm^3 s^-1, and incident radiation fields ranging from 10 to 100 times the Galactic mean value. We find that these high-radiation, low-formation-rate models can also explain the enhanced N(4)/N(2) and N(5)/N(3) rotational excitation ratios in the Clouds. We use H2 column densities in low rotational states (J = 0 and 1) to derive a mean kinetic and/or rotational temperature ⟨T_01⟩ = 82 +/- 21 K for clouds with N(H2) >= 10^16 cm^-2, similar to Galactic gas. We discuss the implications of this work for theories of star formation in low-metallicity environments. [Abstract abridged]
    Comment: 30 pages emulateapj, 14 figures (7 color), 7 tables, accepted for publication in the Astrophysical Journal, figures 11 and 12 compressed at slight loss of quality, see http://casa.colorado.edu/~tumlinso/h2/ for full version
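
The two derived quantities in this abstract, the molecular fraction and the J = 0 to 1 excitation temperature, follow from column densities through standard relations. The sketch below assumes the usual definitions, which the abstract itself does not spell out: f_H2 = 2N(H2) / [N(HI) + 2N(H2)], and a Boltzmann ratio N(1)/N(0) = 9 exp(-170.5 K / T_01). The column densities in the example are invented.

```python
import math

E1_OVER_K = 170.5   # energy of the H2 J=1 level above J=0, in kelvin
G1_OVER_G0 = 9.0    # statistical weight ratio of the J=1 and J=0 levels


def molecular_fraction(n_hi, n_h2):
    """Fraction of hydrogen nuclei bound in H2, from column densities in cm^-2."""
    return 2.0 * n_h2 / (n_hi + 2.0 * n_h2)


def excitation_temperature_01(n_j0, n_j1):
    """Rotational temperature from the J=0 and J=1 column densities,
    assuming the two levels follow a Boltzmann distribution."""
    ratio = G1_OVER_G0 * n_j0 / n_j1
    if ratio <= 1.0:
        raise ValueError("N(1)/N(0) must be below 9 for a finite temperature")
    return E1_OVER_K / math.log(ratio)


# Hypothetical sight line: N(HI) = 1e21 cm^-2 and N(H2) = 5e18 cm^-2.
print(f"f_H2 = {molecular_fraction(1e21, 5e18):.4f}")
print(f"T_01 = {excitation_temperature_01(3e18, 2e18):.0f} K")
```

The hypothetical sight line gives a molecular fraction of about 0.01, the same order as the Magellanic Cloud averages quoted in the abstract.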

    Genome-wide association of white blood cell counts in Hispanic/Latino Americans: the Hispanic Community Health Study/Study of Latinos

    Circulating white blood cell (WBC) counts (neutrophils, monocytes, lymphocytes, eosinophils, basophils) differ by ethnicity. The genetic factors underlying basal WBC traits in Hispanics/Latinos are unknown. We performed a genome-wide association study of total WBC and differential counts in a large, ethnically diverse US population sample of Hispanics/Latinos ascertained by the Hispanic Community Health Study and Study of Latinos (HCHS/SOL). We demonstrate that several previously known WBC-associated genetic loci (e.g. the African Duffy antigen receptor for chemokines null variant for neutrophil count) are generalizable to WBC traits in Hispanics/Latinos. We identified and replicated common and rare germ-line variants at FLT3 (a gene often somatically mutated in leukemia) associated with monocyte count. The common FLT3 variant rs76428106 has a large allele frequency differential between African and non-African populations. We also identified several novel genetic loci involving or regulating hematopoietic transcription factors (CEBPE-SLC7A7, CEBPA and CRBN-TRNT1) associated with basophil count. The minor allele of the CEBPE variant associated with lower basophil count has been previously associated with Amerindian ancestry and higher risk of acute lymphoblastic leukemia in Hispanics. Together, these data suggest that germline genetic variation affecting transcriptional and signaling pathways that underlie WBC development and lineage specification can contribute to inter-individual as well as ethnic differences in peripheral blood cell counts (normal hematopoiesis) in addition to susceptibility to leukemia (malignant hematopoiesis)

    Genome-wide association study of red blood cell traits in Hispanics/Latinos: The Hispanic Community Health Study/Study of Latinos

    Prior GWAS have identified loci associated with red blood cell (RBC) traits in populations of European, African, and Asian ancestry. These studies have not included individuals with an Amerindian ancestral background, such as Hispanics/Latinos, nor evaluated the full spectrum of genomic variation beyond single nucleotide variants. Using a custom genotyping array enriched for Amerindian ancestral content and 1000 Genomes imputation, we performed GWAS in 12,502 participants of Hispanic Community Health Study and Study of Latinos (HCHS/SOL) for hematocrit, hemoglobin, RBC count, RBC distribution width (RDW), and RBC indices. Approximately 60% of previously reported RBC trait loci generalized to HCHS/SOL Hispanics/Latinos, including African ancestral alpha- and beta-globin gene variants. In addition to the known 3.8kb alpha-globin copy number variant, we identified an Amerindian ancestral association in an alpha-globin regulatory region on chromosome 16p13.3 for mean corpuscular volume and mean corpuscular hemoglobin. We also discovered and replicated three genome-wide significant variants in previously unreported loci for RDW (SLC12A2 rs17764730, PSMB5 rs941718), and hematocrit (PROX1 rs3754140). Among the proxy variants at the SLC12A2 locus we identified rs3812049, located in a bi-directional promoter between SLC12A2 (which encodes a red cell membrane ion-transport protein) and an upstream anti-sense long-noncoding RNA, LINC01184, as the likely causal variant. We further demonstrate that disruption of the regulatory element harboring rs3812049 affects transcription of SLC12A2 and LINC01184 in human erythroid progenitor cells. Together, these results reinforce the importance of genetic study of diverse ancestral populations, in particular Hispanics/Latinos

    Genome-wide Association Study of Platelet Count Identifies Ancestry-Specific Loci in Hispanic/Latino Americans

    Platelets play an essential role in hemostasis and thrombosis. We performed a genome-wide association study of platelet count in 12,491 participants of the Hispanic Community Health Study/Study of Latinos by using a mixed-model method that accounts for admixture and family relationships. We discovered and replicated associations with five genes (ACTN1, ETV7, GABBR1-MOG, MEF2C, and ZBTB9-BAK1). Our strongest association was with Amerindian-specific variant rs117672662 (p value = 1.16 × 10^-28) in ACTN1, a gene implicated in congenital macrothrombocytopenia. rs117672662 exhibited allelic differences in transcriptional activity and protein binding in hematopoietic cells. Our results underscore the value of diverse populations to extend insights into the allelic architecture of complex traits.

    Genetic Diversity and Association Studies in US Hispanic/Latino Populations: Applications in the Hispanic Community Health Study/Study of Latinos

    US Hispanic/Latino individuals are diverse in genetic ancestry, culture, and environmental exposures. Here, we characterized and controlled for this diversity in genome-wide association studies (GWASs) for the Hispanic Community Health Study/Study of Latinos (HCHS/SOL). We simultaneously estimated population-structure principal components (PCs) robust to familial relatedness and pairwise kinship coefficients (KCs) robust to population structure, admixture, and Hardy-Weinberg departures. The PCs revealed substantial genetic differentiation within and among six self-identified background groups (Cuban, Dominican, Puerto Rican, Mexican, and Central and South American). To control for variation among groups, we developed a multi-dimensional clustering method to define a “genetic-analysis group” variable that retains many properties of self-identified background while achieving substantially greater genetic homogeneity within groups and including participants with non-specific self-identification. In GWASs of 22 biomedical traits, we used a linear mixed model (LMM) including pairwise empirical KCs to account for familial relatedness, PCs for ancestry, and genetic-analysis groups for additional group-associated effects. Including the genetic-analysis group as a covariate accounted for significant trait variation in 8 of 22 traits, even after we fit 20 PCs. Additionally, genetic-analysis groups had significant heterogeneity of residual variance for 20 of 22 traits, and modeling this heteroscedasticity within the LMM reduced genomic inflation for 19 traits. Furthermore, fitting an LMM that utilized a genetic-analysis group rather than a self-identified background group achieved higher power to detect previously reported associations. We expect that the methods applied here will be useful in other studies with multiple ethnic groups, admixture, and relatedness